Based on A. Levitin, Introduction to the Design & Analysis of Algorithms, 2nd ed., Ch. 1. Copyright © 2007 Pearson Addison-Wesley.
CSE408
Fundamentals of
Algorithms
Lecture #1
What is an algorithm?
An algorithm is a sequence of unambiguous instructions
for solving a problem, i.e., for obtaining a required
output for any legitimate input in a finite amount of time.
“computer”
problem
algorithm
input output
Algorithm
An algorithm is a sequence of unambiguous
instructions for solving a problem, i.e., for obtaining a
required output for any legitimate input in a finite
amount of time.
Can be represented in various forms
Unambiguity/clearness
Effectiveness
Finiteness/termination
Correctness
Historical Perspective
Euclid’s algorithm for finding the greatest common divisor
Muhammad ibn Musa al-Khwarizmi, 9th-century
mathematician (the word “algorithm” derives from his name)
www.lib.virginia.edu/science/parshall/khwariz.html
Notion of algorithm and problem
“computer”
algorithmic solution
(different from a conventional solution)
problem
algorithm
input
(or instance) output
Example of computational problem: sorting
Statement of problem:
Input: A sequence of n numbers <a1, a2, …, an>
Output: A reordering of the input sequence <a´1, a´2, …, a´n> so that a´i ≤ a´j whenever i < j
Instance: The sequence <5, 3, 2, 8, 3>
Algorithms:
Selection sort
Insertion sort
Merge sort
(many others)
Selection Sort
Input: array a[1], …, a[n]
Output: array a sorted in non-decreasing order
Algorithm:
for i ← 1 to n do
    swap a[i] with the smallest of a[i], …, a[n]
Is this unambiguous? Effective?
See also pseudocode, section 3.1
Some Well-known Computational Problems
Sorting
Searching
Shortest paths in a graph
Minimum spanning tree
Primality testing
Traveling salesman problem
Knapsack problem
Chess
Towers of Hanoi
Program termination
Some of these problems don’t have efficient algorithms,
or algorithms at all!
Basic Issues Related to Algorithms
How to design algorithms
How to express algorithms
Proving correctness
Efficiency (or complexity) analysis
Theoretical analysis
Empirical analysis
Optimality
Algorithm design strategies
Brute force
Divide and conquer
Decrease and conquer
Transform and conquer
Greedy approach
Dynamic programming
Backtracking and branch-and-bound
Space and time tradeoffs
Analysis of Algorithms
How good is the algorithm?
Correctness
Time efficiency
Space efficiency
Does there exist a better algorithm?
Lower bounds
Optimality
What is an algorithm?
Recipe, process, method, technique, procedure, routine,…
with the following requirements:
1. Finiteness
terminates after a finite number of steps
2. Definiteness
rigorously and unambiguously specified
3. Clearly specified input
valid inputs are clearly specified
4. Clearly specified/expected output
can be proved to produce the correct output given a valid input
5. Effectiveness
steps are sufficiently simple and basic
Why study algorithms?
Theoretical importance
the core of computer science
Practical importance
A practitioner’s toolkit of known algorithms
Framework for designing and analyzing algorithms for new problems
Example: Google’s PageRank Technology
Euclid’s Algorithm
Problem: Find gcd(m,n), the greatest common divisor of two
nonnegative, not both zero integers m and n
Examples: gcd(60,24) = 12, gcd(60,0) = 60, gcd(0,0) = ?
Euclid’s algorithm is based on repeated application of equality
gcd(m,n) = gcd(n, m mod n)
until the second number becomes 0, which makes the problem
trivial.
Example: gcd(60,24) = gcd(24,12) = gcd(12,0) = 12
Two descriptions of Euclid’s algorithm
Step 1 If n = 0, return m and stop; otherwise go to Step 2
Step 2 Divide m by n and assign the value of the remainder to r
Step 3 Assign the value of n to m and the value of r to n. Go to Step 1.
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m
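
The loop version translates directly into executable code; a minimal Python sketch (the function name gcd is ours):

def gcd(m, n):
    # Repeatedly replace (m, n) with (n, m mod n) until n becomes 0.
    while n != 0:
        m, n = n, m % n
    return m

print(gcd(60, 24))  # 12
print(gcd(60, 0))   # 60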
Other methods for computing gcd(m,n)
Consecutive integer checking algorithm
Step 1 Assign the value of min{m, n} to t
Step 2 Divide m by t. If the remainder is 0, go to Step 3; otherwise, go to Step 4
Step 3 Divide n by t. If the remainder is 0, return t and stop; otherwise, go to Step 4
Step 4 Decrease t by 1 and go to Step 2
Is this slower than Euclid’s algorithm?
How much slower?
O(n) (if n ≤ m), vs. O(log n) for Euclid’s algorithm
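
For comparison, a Python sketch of the consecutive integer checking algorithm (assuming m and n are positive, since Step 1 is undefined when min{m, n} = 0):

def gcd_consecutive(m, n):
    t = min(m, n)
    # Decrease t until it divides both m and n; t = 1 always divides both,
    # so the loop terminates for positive inputs.
    while m % t != 0 or n % t != 0:
        t -= 1
    return t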
Other methods for gcd(m,n) [cont.]
Middle-school procedure
Step 1 Find the prime factorization of m
Step 2 Find the prime factorization of n
Step 3 Find all the common prime factors
Step 4 Compute the product of all the common prime factors
and return it as gcd(m,n)
Is this an algorithm?
How efficient is it?
Time complexity: O(sqrt(n))
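
A Python sketch of the middle-school procedure, with trial division standing in for the unspecified factorization step (the helper names are ours):

from collections import Counter

def prime_factors(n):
    # Trial division up to sqrt(n): O(sqrt(n)) divisions.
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def gcd_middle_school(m, n):
    # Common prime factors with multiplicity, then their product.
    common = Counter(prime_factors(m)) & Counter(prime_factors(n))
    g = 1
    for p, k in common.items():
        g *= p ** k
    return g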
Sieve of Eratosthenes
Input: Integer n ≥ 2
Output: List of primes less than or equal to n
for p ← 2 to n do A[p] ← p
for p ← 2 to n do
    if A[p] ≠ 0        // p hasn’t been previously eliminated from the list
        j ← p * p
        while j ≤ n do
            A[j] ← 0   // mark element as eliminated
            j ← j + p
Example: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Time complexity: O(n log log n)
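
The pseudocode above maps directly to Python; a minimal sketch:

def sieve(n):
    # A[p] == p means p is still in the list; 0 means eliminated.
    A = list(range(n + 1))
    for p in range(2, n + 1):
        if A[p] != 0:            # p hasn't been eliminated
            j = p * p
            while j <= n:
                A[j] = 0         # mark multiples of p as eliminated
                j += p
    return [p for p in range(2, n + 1) if A[p] != 0]

print(sieve(20))  # [2, 3, 5, 7, 11, 13, 17, 19]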
Fundamentals of Algorithmic Problem Solving
Two main issues related to algorithms
How to design algorithms
How to analyze algorithm efficiency
Algorithm design techniques/strategies
Brute force
Divide and conquer
Decrease and conquer
Transform and conquer
Space and time tradeoffs
Greedy approach
Dynamic programming
Iterative improvement
Backtracking
Branch and bound
Analysis of algorithms
How good is the algorithm?
time efficiency
space efficiency
correctness (taken as given in this course)
Does there exist a better algorithm?
lower bounds
optimality
Important problem types
sorting
searching
string processing
graph problems
combinatorial problems
geometric problems
numerical problems
Sorting (I)
Rearrange the items of a given list in ascending order.
Input: A sequence of n numbers <a1, a2, …, an>
Output: A reordering <a´1, a´2, …, a´n> of the input sequence such that a´1 ≤ a´2 ≤ … ≤ a´n.
Why sorting?
Help searching
Algorithms often use sorting as a key subroutine.
Sorting key
A specially chosen piece of information used to guide sorting. E.g., sort
student records by names.
Sorting (II)
Examples of sorting algorithms
Selection sort
Bubble sort
Insertion sort
Merge sort
Heap sort …
Evaluate sorting algorithm complexity: the number of key comparisons.
Two properties
Stability: A sorting algorithm is called stable if it preserves the relative order of any two equal elements in its input.
In place: A sorting algorithm is in place if it does not require extra memory, except, possibly, for a few memory units.
Selection Sort
Algorithm SelectionSort(A[0..n-1])
//The algorithm sorts a given array by selection sort
//Input: An array A[0..n-1] of orderable elements
//Output: Array A[0..n-1] sorted in ascending order
for i ← 0 to n − 2 do
    min ← i
    for j ← i + 1 to n − 1 do
        if A[j] < A[min]
            min ← j
    swap A[i] and A[min]
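
The same algorithm in Python, 0-based as in the pseudocode (a sketch; sorts in place):

def selection_sort(A):
    n = len(A)
    for i in range(n - 1):           # i = 0 .. n-2
        m = i                        # index of the smallest in A[i..n-1]
        for j in range(i + 1, n):
            if A[j] < A[m]:
                m = j
        A[i], A[m] = A[m], A[i]      # swap A[i] and A[min]

a = [7, 3, 2, 5]
selection_sort(a)
print(a)  # [2, 3, 5, 7]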
Searching
Find a given value, called a search key, in a given set.
Examples of searching algorithms
Sequential search
Binary search …
Input: sorted array a_i < … < a_j and key x
m ← ⌊(i + j)/2⌋
while i < j and x ≠ a_m do
    if x < a_m then j ← m − 1
    else i ← m + 1
    m ← ⌊(i + j)/2⌋        // recompute the midpoint each iteration
if x = a_m then output a_m
Time: O(log n)
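
A Python sketch of binary search with the midpoint recomputed on every iteration (0-based indices; returning None on failure is our convention):

def binary_search(a, x):
    i, j = 0, len(a) - 1
    while i <= j:
        m = (i + j) // 2
        if x == a[m]:
            return m              # index of x in a
        elif x < a[m]:
            j = m - 1             # continue in the left half
        else:
            i = m + 1             # continue in the right half
    return None                   # x is not in a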
String Processing
A string is a sequence of characters from an alphabet.
Text strings: letters, numbers, and special characters.
String matching: searching for a given word/pattern in a text.
Examples:
(i) searching for a word or phrase on WWW or in a
Word document
(ii) searching for a short read in the reference genomic
sequence
CSE408
Fundamentals of Data
Structure
Lecture #2
Fundamental data structures
list
array
linked list
string
stack
queue
priority queue/heap
graph
tree and binary tree
set and dictionary
Linear Data Structures
Arrays
A sequence of n items of the same data
type that are stored contiguously in
computer memory and made accessible
by specifying a value of the array’s
index.
Linked List
A sequence of zero or more nodes each
containing two kinds of information:
some data and one or more links called
pointers to other nodes of the linked
list.
Singly linked list (next pointer)
Doubly linked list (next + previous
pointers)
Arrays
fixed length (need preliminary
reservation of memory)
contiguous memory locations
direct access
Insert/delete
Linked Lists
dynamic length
arbitrary memory locations
access by following links
Insert/delete
[Figure: singly linked list a1 → a2 → … → an]
Stacks and Queues
Stacks
A stack of plates
insertion/deletion can be done only at the top.
LIFO
Two operations (push and pop)
Queues
A queue of customers waiting for services
Insertion/enqueue from the rear and deletion/dequeue from the
front.
FIFO
Two operations (enqueue and dequeue)
Priority Queue and Heap
Priority queues (implemented using heaps)
A data structure for maintaining a set of elements,
each associated with a key/priority, with the
following operations
Finding the element with the highest priority
Deleting the element with the highest priority
Inserting a new element
Scheduling jobs on a shared computer
Example: a heap with keys 9, 6, 8, 5, 2, 3 (root 9, children 6 and 8, leaves 5, 2, 3); array representation: 9 6 8 5 2 3
Graphs
Formal definition
A graph G = <V, E> is defined by a pair of two sets: a
finite set V of items called vertices and a set E of vertex
pairs called edges.
Undirected and directed graphs (digraphs).
What’s the maximum number of edges in an undirected graph
with |V| vertices?
Complete, dense, and sparse graphs
A graph with every pair of its vertices connected by an edge is called complete, denoted K|V|. A dense graph is a graph in which the number of edges is close to the maximal number of edges; a sparse graph is one in which the number of edges is close to the minimal number of edges.
Graph Representation
Adjacency matrix
n x n boolean matrix if |V| is n.
The element on the ith row and jth column is 1 if there’s an edge
from ith vertex to the jth vertex; otherwise 0.
The adjacency matrix of an undirected graph is symmetric.
Adjacency linked lists
A collection of linked lists, one for each vertex, that contain all the
vertices adjacent to the list’s vertex.
Which data structure would you use if the graph is a 100-node star
shape?
Example (digraph with edges 1→2, 1→3, 1→4, 2→4, 3→4):
Adjacency matrix:
0 1 1 1
0 0 0 1
0 0 0 1
0 0 0 0
Adjacency lists: 1 → 2, 3, 4;  2 → 4;  3 → 4;  4 → (none)
Weighted Graphs
Weighted graphs
Graphs or digraphs with numbers assigned to the edges.
[Figure: weighted graph on vertices 1, 2, 3, 4 with edge weights 5, 6, 7, 8, 9]
Graph Properties -- Paths and Connectivity
Paths
A path from vertex u to v of a graph G is defined as a sequence of
adjacent (connected by an edge) vertices that starts with u and ends with
v.
Simple paths: All edges of a path are distinct.
Path lengths: the number of edges, or the number of vertices − 1.
Connected graphs
A graph is said to be connected if for every pair of its vertices u and v
there is a path from u to v.
Connected component
A maximal connected subgraph of a given graph.
Graph Properties -- Acyclicity
Cycle
A simple path of a positive length that starts and ends at the same vertex.
Acyclic graph
A graph without cycles
DAG (Directed Acyclic Graph)
Trees
Trees
A tree (or free tree) is a connected acyclic graph.
Forest: a graph that has no cycles but is not necessarily connected.
Properties of trees
For every two vertices in a tree there always exists exactly one simple
path from one of these vertices to the other. Why?
Rooted trees: The above property makes it possible to select an
arbitrary vertex in a free tree and consider it as the root of the so
called rooted tree.
Levels in a rooted tree.
|E| = |V| − 1
[Figure: a free tree on vertices 1–5 and the rooted tree obtained by selecting vertex 1 as the root]
Rooted Trees (I)
Ancestors
For any vertex v in a tree T, all the vertices on the simple path
from the root to that vertex are called ancestors.
Descendants
All the vertices for which a vertex v is an ancestor are said to be
descendants of v.
Parent, child and siblings
If (u, v) is the last edge of the simple path from the root to
vertex v, u is said to be the parent of v and v is called a child of
u.
Vertices that have the same parent are called siblings.
Leaves
A vertex without children is called a leaf.
Subtree
A vertex v with all its descendants is called the subtree of T
rooted at v.
Rooted Trees (II)
Depth of a vertex
The length of the simple path from the root to the vertex.
Height of a tree
The length of the longest simple path from the root to a leaf.
[Figure: rooted tree on vertices 1–5 with height h = 2]
Ordered Trees
Ordered trees
An ordered tree is a rooted tree in which all the children of each vertex
are ordered.
Binary trees
A binary tree is an ordered tree in which every vertex has no more than two children and each child is designated as either a left child or a right child of its parent.
Binary search trees
Each vertex is assigned a number.
A number assigned to each parental vertex is larger than all the numbers
in its left subtree and smaller than all the numbers in its right subtree.
⌊log2 n⌋ ≤ h ≤ n − 1, where h is the height of a binary tree and n its size.
[Figure: a binary tree on the keys 9, 6, 8, 5, 2, 3, and a binary search tree with root 6, children 3 and 9, and leaves 2, 5, 8]
CSE408
String Matching
Algorithm
Lecture # 5&6
String Matching Problem
Motivations: text-editing, pattern matching in DNA sequences
Text: array T[1..n]    Pattern: array P[1..m]
Array element: character from a finite alphabet Σ
Pattern P occurs with shift s in T if P[1..m] = T[s+1..s+m], where 0 ≤ s ≤ n − m (and m ≤ n)
String Matching Algorithms
Naive Algorithm
Worst-case running time in O((n-m+1) m)
Rabin-Karp
Worst-case running time in O((n-m+1) m)
Better than this on average and in practice
Knuth-Morris-Pratt
Worst-case running time in O(n + m)
Notation & Terminology
Σ* = set of all finite-length strings formed using characters from alphabet Σ
Empty string: ε
|x| = length of string x
w is a prefix of x: w ⊏ x
w is a suffix of x: w ⊐ x
prefix and suffix are transitive
ab ⊏ abcca
cca ⊐ abcca
Naive String Matching
worst-case running time is in Θ((n−m+1)m)
Rabin-Karp Algorithm
Assume each character is a digit in radix-d notation (e.g., d = 10)
p = decimal value of the pattern
ts = decimal value of substring T[s+1..s+m], for s = 0, 1, …, n−m
Strategy:
compute p in O(m) time (which is in O(n))
compute all ts values in a total of O(n) time
find all valid shifts s in O(n) time by comparing p with each ts
Compute p in O(m) time using Horner’s rule:
p = P[m] + d(P[m−1] + d(P[m−2] + … + d(P[2] + dP[1])))
Compute t0 similarly from T[1..m] in O(m) time
Compute the remaining ts’s in O(n−m) time via the rolling update
t(s+1) = d(ts − d^(m−1)·T[s+1]) + T[s+m+1]
Rabin-Karp Algorithm
p and the ts values may be large, so work with them modulo a number q
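
Putting the pieces together (Horner’s-rule preprocessing, the rolling update, and hit verification), a Python sketch of Rabin-Karp; d and q are tunable assumptions, here d = 256 for byte-sized characters and a small prime q:

def rabin_karp(T, P, d=256, q=101):
    n, m = len(T), len(P)
    if m > n:
        return []
    h = pow(d, m - 1, q)                   # d^(m-1) mod q
    p = t = 0
    for i in range(m):                     # Horner's rule: O(m)
        p = (d * p + ord(P[i])) % q
        t = (d * t + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        if p == t and T[s:s + m] == P:     # verify: rules out spurious hits
            shifts.append(s)
        if s < n - m:                      # rolling update: t_{s+1} from t_s
            t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q
    return shifts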
Rabin-Karp Algorithm (continued)
Example (pattern p = 31415): a text window whose value is congruent to p mod q but whose characters differ from P is a spurious hit; every hit must therefore be checked explicitly. The window value is advanced by the rolling update t(s+1) = d(ts − d^(m−1)·T[s+1]) + T[s+m+1].
Rabin-Karp Algorithm (continued)
Preprocessing (computing p and t0 by Horner’s rule) takes Θ(m), which is in Θ(n); the rolling update uses d^(m−1), the high-order digit position for an m-digit window.
The matching loop tries all possible shifts, maintaining the invariant that when the comparison on line 10 is executed, ts = T[s+1..s+m] mod q; each hit is then verified character by character to rule out spurious hits.
Worst-case running time is in Θ((n−m+1)m). (d is the radix, q is the modulus.)
Rabin-Karp Algorithm (continued)
Average-case running time is in O(n + m):
Assume reducing mod q behaves like a random mapping from Σ* to Z_q.
Then the chance that ts = p mod q at any given shift is 1/q, so the number of spurious hits is in O(n/q).
Expected matching time = O(n) + O(m(v + n/q)), where v = number of valid shifts.
If v is in O(1) and q ≥ m, this is O(n), and the total including the Θ(m) preprocessing is O(n + m).
The Knuth-Morris-Pratt Algorithm
Knuth, Morris and Pratt proposed a linear-time algorithm for the string matching problem.
A matching time of O(n) is achieved by avoiding comparisons with elements of ‘S’ that have previously been involved in a comparison with some element of the pattern ‘p’ to be matched; i.e., backtracking on the string ‘S’ never occurs.
Components of KMP algorithm
The prefix function, Π
The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern ‘p’; in other words, it enables avoiding backtracking on the string ‘S’.
The KMP Matcher
With string ‘S’, pattern ‘p’ and prefix function ‘Π’ as inputs, it finds the occurrence of ‘p’ in ‘S’ and returns the number of shifts of ‘p’ after which the occurrence is found.
The prefix function, Π
The following pseudocode computes the prefix function, Π:
Compute-Prefix-Function (p)
1  m ← length[p]              // ‘p’ is the pattern to be matched
2  Π[1] ← 0
3  k ← 0
4  for q ← 2 to m
5      do while k > 0 and p[k+1] ≠ p[q]
6          do k ← Π[k]
7      if p[k+1] = p[q]
8          then k ← k + 1
9      Π[q] ← k
10 return Π
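
The same computation in Python with 0-based indexing (so pi[q] plays the role of Π[q+1]); a sketch:

def compute_prefix_function(p):
    m = len(p)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]          # fall back to the next-shorter border
        if p[k] == p[q]:
            k += 1
        pi[q] = k                  # length of the longest proper prefix of
    return pi                      # p[0..q] that is also its suffix

print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]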
Example: compute Π for the pattern ‘p’ below:
p:  a b a b a c a
Initially: m = length[p] = 7
Π[1] = 0
k = 0
Step 1: q = 2, k = 0, Π[2] = 0

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0

Step 2: q = 3, k = 0, Π[3] = 1

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1

Step 3: q = 4, k = 1, Π[4] = 2

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2
Step 4: q = 5, k = 2, Π[5] = 3

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3

Step 5: q = 6, k = 3; the mismatch p[4] ≠ p[6] drives k back to 0, so Π[6] = 0

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3 0

Step 6: q = 7, k = 0, Π[7] = 1

After iterating 6 times, the prefix function computation is complete:

q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3 0 1
The KMP Matcher
The KMP Matcher, with pattern ‘p’, string ‘S’ and prefix function ‘Π’ as input, finds a match of p in S.
The following pseudocode computes the matching component of the KMP algorithm:
KMP-Matcher(S, p)
1  n ← length[S]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0                    // number of characters matched
5  for i ← 1 to n           // scan S from left to right
6      do while q > 0 and p[q+1] ≠ S[i]
7          do q ← Π[q]      // next character does not match
8      if p[q+1] = S[i]
9          then q ← q + 1   // next character matches
10     if q = m             // is all of p matched?
11         then print “Pattern occurs with shift” i − m
12         q ← Π[q]         // look for the next match
Note: KMP finds every occurrence of ‘p’ in ‘S’. That is why KMP does not terminate in step 12; rather, it searches the remainder of ‘S’ for any more occurrences of ‘p’.
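
A 0-based Python sketch of the matcher, reusing compute_prefix_function from above and collecting every shift instead of printing:

def kmp_matcher(S, p):
    n, m = len(S), len(p)
    pi = compute_prefix_function(p)
    q = 0                              # number of characters matched
    shifts = []
    for i in range(n):                 # scan S from left to right
        while q > 0 and p[q] != S[i]:
            q = pi[q - 1]              # next character does not match
        if p[q] == S[i]:
            q += 1                     # next character matches
        if q == m:                     # all of p matched
            shifts.append(i - m + 1)   # 0-based shift
            q = pi[q - 1]              # look for the next match
    return shifts

print(kmp_matcher("bacbabababacaab", "ababaca"))  # [6]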
Illustration: given a string ‘S’ and pattern ‘p’ as follows:
S:  b a c b a b a b a b a c a a b
p:  a b a b a c a
Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘S’.
For ‘p’ the prefix function Π was computed previously and is as follows:
q  1 2 3 4 5 6 7
p  a b a b a c a
Π  0 0 1 2 3 0 1
Initially: n = size of S = 15; m = size of p = 7

Step 1: i = 1, q = 0. Comparing p[1] with S[1]: p[1] does not match S[1], so ‘p’ is shifted one position to the right.

Step 2: i = 2, q = 0. Comparing p[1] with S[2]: p[1] matches S[2]; since there is a match, p is not shifted.
Step 3: i = 3, q = 1. Comparing p[2] with S[3]: p[2] does not match S[3]. Backtracking on p, p[1] is compared with S[3].

Step 4: i = 4, q = 0. Comparing p[1] with S[4]: p[1] does not match S[4].

Step 5: i = 5, q = 0. Comparing p[1] with S[5]: p[1] matches S[5].
Step 6: i = 6, q = 1. Comparing p[2] with S[6]: p[2] matches S[6].

Step 7: i = 7, q = 2. Comparing p[3] with S[7]: p[3] matches S[7].

Step 8: i = 8, q = 3. Comparing p[4] with S[8]: p[4] matches S[8].

Step 9: i = 9, q = 4. Comparing p[5] with S[9]: p[5] matches S[9].

Step 10: i = 10, q = 5. Comparing p[6] with S[10]: p[6] does not match S[10]. Backtracking on p, p[4] is compared with S[10], because after the mismatch q = Π[5] = 3.

Step 11: i = 11, q = 4. Comparing p[5] with S[11]: p[5] matches S[11].
Step 12: i = 12, q = 5. Comparing p[6] with S[12]: p[6] matches S[12].

Step 13: i = 13, q = 6. Comparing p[7] with S[13]: p[7] matches S[13].

Pattern ‘p’ has been found to occur completely in string ‘S’. The shift at which the match was found is i − m = 13 − 7 = 6.
Running-time analysis
Compute-Prefix-Function (p)
1  m ← length[p]              // ‘p’ is the pattern to be matched
2  Π[1] ← 0
3  k ← 0
4  for q ← 2 to m
5      do while k > 0 and p[k+1] ≠ p[q]
6          do k ← Π[k]
7      if p[k+1] = p[q]
8          then k ← k + 1
9      Π[q] ← k
10 return Π
In the above pseudocode for computing the prefix function, the for loop from step 4 to step 10 runs m − 1 times. Steps 1 to 3 take constant time. Hence the running time of Compute-Prefix-Function is Θ(m).
KMP Matcher
KMP-Matcher(S, p)
1  n ← length[S]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0
5  for i ← 1 to n
6      do while q > 0 and p[q+1] ≠ S[i]
7          do q ← Π[q]
8      if p[q+1] = S[i]
9          then q ← q + 1
10     if q = m
11         then print “Pattern occurs with shift” i − m
12         q ← Π[q]
The for loop beginning in step 5 runs n times, i.e., as long as the length of the string ‘S’. Since steps 1 to 4 take constant time, the running time is dominated by this for loop. Thus the running time of the matching function is Θ(n).
Knuth-Morris-Pratt Algorithm
Preprocessing (Compute-Prefix-Function) is Θ(m), which is in Θ(n). The matcher maintains q, the number of characters matched, and scans the text left to right: it follows Π links when the next character does not match, advances q when it matches, and when all of P is matched it reports the shift and looks for the next match. By amortized analysis the matching phase is Θ(n), so the total running time is Θ(m + n).
Knuth-Morris-Pratt Algorithm
Amortized Analysis (potential method): let the potential be Φ(k) = k, where k is the current state of the algorithm. The initial potential value is 0, and the potential is never negative since Φ(k) = k ≥ 0 for all k. Each execution of the inner while loop decreases the potential, while each execution of the for loop body increases it by at most 1, so the amortized cost of the loop body is in O(1). With Θ(m) loop iterations, Compute-Prefix-Function runs in Θ(m); the same argument gives Θ(n) for the matcher.
CSE408
Brute Force (String Matching,
Closest Pair, Convex Hull,
Exhaustive Search, Voronoi
Diagrams)
Lecture # 7&8
Brute Force
A straightforward approach, usually based directly on the
problem’s statement and definitions of the concepts
involved
Examples:
1. Computing a^n (a > 0, n a nonnegative integer)
2. Computing n!
3. Multiplying two matrices
4. Searching for a key of a given value in a list
Brute-Force Sorting Algorithm
Selection Sort: Scan the array to find its smallest element and swap it with the first element. Then, starting with the second element, scan the elements to the right of it to find the smallest among them and swap it with the second element. Generally, on pass i (0 ≤ i ≤ n−2), find the smallest element in A[i..n−1] and swap it with A[i]:

A[0] ≤ … ≤ A[i−1] | A[i], …, A[min], …, A[n−1]

(the elements to the left of the bar are already in their final positions)

Example: 7 3 2 5
Analysis of Selection Sort
Time efficiency:
Space efficiency:
Stability:
Θ(n^2)
Θ(1), so in place
yes
Brute-Force String Matching
pattern: a string of m characters to search for
text: a (longer) string of n characters to search in
problem: find a substring in the text that matches the pattern
Brute-force algorithm
Step 1 Align pattern at beginning of text
Step 2 Moving from left to right, compare each character of
pattern to the corresponding character in text until
all characters are found to match (successful search); or
a mismatch is detected
Step 3 While pattern is not found and the text is not yet
exhausted, realign pattern one position to the right and
repeat Step 2
Examples of Brute-Force String Matching
1. Pattern: 001011
Text: 10010101101001100101111010
2. Pattern: happy
Text: It is never too late to have a
happy childhood.
Pseudocode and Efficiency
Time efficiency: Θ(mn) comparisons (in the worst case)
Why?
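
A minimal Python sketch of the brute-force matcher (returning the first matching shift, or −1 if none, is our convention):

def brute_force_match(T, P):
    n, m = len(T), len(P)
    for s in range(n - m + 1):         # try every alignment of P in T
        j = 0
        while j < m and P[j] == T[s + j]:
            j += 1                     # compare left to right
        if j == m:
            return s                   # all m characters matched
    return -1

In the worst case, each of the n − m + 1 shifts makes m character comparisons, which is where the Θ(mn) bound comes from.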
Brute-Force Polynomial Evaluation
Problem: Find the value of polynomial
p(x) = a_n x^n + a_(n−1) x^(n−1) + … + a_1 x + a_0
at a point x = x0
Brute-force algorithm:
p ← 0.0
for i ← n downto 0 do
    power ← 1
    for j ← 1 to i do    // compute x^i
        power ← power * x
    p ← p + a[i] * power
return p

Efficiency: Σ_(i=0..n) i ∈ Θ(n^2) multiplications
Polynomial Evaluation: Improvement
We can do better by evaluating from right to left:
Better brute-force algorithm:
p ← a[0]
power ← 1
for i ← 1 to n do
    power ← power * x
    p ← p + a[i] * power
return p

Efficiency: Θ(n) multiplications
Horner’s rule is another linear-time method.
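
Both linear-time evaluations in Python for contrast (a is the coefficient list, with a[i] the coefficient of x^i; a sketch):

def poly_eval_right_to_left(a, x):
    # Theta(n) multiplications: maintain power == x^i.
    p, power = a[0], 1
    for i in range(1, len(a)):
        power *= x
        p += a[i] * power
    return p

def poly_eval_horner(a, x):
    # Horner's rule: p = (...(a_n*x + a_{n-1})*x + ...) + a_0
    p = 0
    for c in reversed(a):
        p = p * x + c
    return p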
Closest-Pair Problem
Find the two closest points in a set of n points
(in the two-dimensional Cartesian plane).
Brute-force algorithm
Compute the distance between every pair of
distinct points
and return the indexes of the points for which
the distance is the smallest.
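
A Python sketch of this brute-force approach (Θ(n²) pairs; returns the indices of the closest pair and their distance):

import math

def closest_pair(points):
    best, pair = float("inf"), None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            d = math.hypot(x1 - x2, y1 - y2)   # Euclidean distance
            if d < best:
                best, pair = d, (i, j)
    return pair, best

Comparing squared distances instead would avoid the square root without changing which pair wins.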
Closest-Pair Brute-Force Algorithm (cont.)
Efficiency:
How to make it faster?
Θ(n^2) multiplications (or square-root operations)
Using divide-and-conquer!
Convex Hull Problem
The convex-hull problem is the problem of
constructing the convex hull for a given set S of n
points
To solve it, we need to find the points that will
serve as the vertices of the polygon in question.
Mathematicians call the vertices of such a
polygon extreme points.
By definition, an extreme point of a convex set is
a point of this set that is not a middle point of
any line segment with endpoints in the set.
How can we solve the convex-hull problem in a brute-force manner?
Nevertheless, there is a simple but inefficient
algorithm that is based on the following
observation about line segments making up the
boundary of a convex hull
a line segment connecting two points pi and pj of
a set of n points is a part of the convex hull’s
boundary if and only if all the other points of the
set lie on the same side of the straight line
through these two points
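
The “same side” test can be done with the sign of a cross product, giving a Θ(n³) brute-force sketch in Python (ignoring degenerate collinear cases; names are ours):

def convex_hull_edges(points):
    # (i, j) is a hull edge iff every other point lies on one side of the
    # line through points[i] and points[j], i.e. all cross products share a sign.
    edges = []
    n = len(points)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (px, py), (qx, qy) = points[i], points[j]
            signs = [(qx - px) * (ry - py) - (qy - py) * (rx - px)
                     for k, (rx, ry) in enumerate(points) if k not in (i, j)]
            if all(c >= 0 for c in signs) or all(c <= 0 for c in signs):
                edges.append((i, j))
    return edges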
Brute-Force Strengths and Weaknesses
Strengths
wide applicability
simplicity
yields reasonable algorithms for some important problems
(e.g., matrix multiplication, sorting, searching, string
matching)
Weaknesses
rarely yields efficient algorithms
some brute-force algorithms are unacceptably slow
not as constructive as some other design techniques
Exhaustive Search
A brute force solution to a problem involving
search for an element with a special property,
usually among combinatorial objects such as
permutations, combinations, or subsets of a set.
Method:
generate a list of all potential solutions to the problem in a
systematic manner (see algorithms in Sec. 5.4)
evaluate potential solutions one by one, disqualifying
infeasible ones and, for an optimization problem, keeping
track of the best one found so far
when search ends, announce the solution(s) found
Example 1: Traveling Salesman Problem
Given n cities with known distances between
each pair, find the shortest tour that passes
through all the cities exactly once before
returning to the starting city
Alternatively: Find shortest Hamiltonian circuit
in a weighted connected graph
Example:
[Figure: weighted graph on vertices a, b, c, d with edge weights a–b = 2, a–c = 8, a–d = 5, b–c = 3, b–d = 4, c–d = 7]
How do we represent a solution (Hamiltonian circuit)?
TSP by Exhaustive Search
Tour                      Cost
a → b → c → d → a    2 + 3 + 7 + 5 = 17
a → b → d → c → a    2 + 4 + 7 + 8 = 21
a → c → b → d → a    8 + 3 + 4 + 5 = 20
a → c → d → b → a    8 + 7 + 4 + 2 = 21
a → d → b → c → a    5 + 4 + 3 + 8 = 20
a → d → c → b → a    5 + 7 + 3 + 2 = 17
Efficiency:
Θ((n-1)!)
Chapter 5 discusses how to generate permutations fast.
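
An exhaustive-search sketch in Python using itertools.permutations; cities are numbered 0..n−1 and dist is a symmetric matrix (our encoding of the example graph):

from itertools import permutations

def tsp_exhaustive(dist, start=0):
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(c for c in range(n) if c != start):
        tour = (start,) + perm + (start,)
        cost = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_tour, best_cost

# a=0, b=1, c=2, d=3 with the weights above:
dist = [[0, 2, 8, 5], [2, 0, 3, 4], [8, 3, 0, 7], [5, 4, 7, 0]]
print(tsp_exhaustive(dist))   # ((0, 1, 2, 3, 0), 17)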
Example 2: Knapsack Problem
Given n items:
weights: w1 w2 … wn
values: v1 v2 … vn
a knapsack of capacity W
Find most valuable subset of the items that fit into the
knapsack
Example: Knapsack capacity W=16
item weight value
1 2 $20
2 5 $30
3 10 $50
4 5 $10
Knapsack Problem by Exhaustive Search
Subset Total weight Total value
{1} 2 $20
{2} 5 $30
{3} 10 $50
{4} 5 $10
{1,2} 7 $50
{1,3} 12 $70
{1,4} 7 $30
{2,3} 15 $80
{2,4} 10 $40
{3,4} 15 $60
{1,2,3} 17 not feasible
{1,2,4} 12 $60
{1,3,4} 17 not feasible
{2,3,4} 20 not feasible
{1,2,3,4} 22 not feasible
Efficiency: Θ(2^n)
Each subset can be represented by a binary string (bit vector, Ch 5).
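A bit-vector sketch in Python: each integer mask from 0 to 2^n − 1 encodes a subset (0-based item numbering is our convention):

def knapsack_exhaustive(weights, values, W):
    n = len(weights)
    best_value, best_subset = 0, ()
    for mask in range(1 << n):                     # all 2^n subsets
        items = [i for i in range(n) if mask >> i & 1]
        if sum(weights[i] for i in items) <= W:    # feasible?
            v = sum(values[i] for i in items)
            if v > best_value:
                best_value, best_subset = v, tuple(items)
    return best_subset, best_value

print(knapsack_exhaustive([2, 5, 10, 5], [20, 30, 50, 10], 16))
# ((1, 2), 80) — items {2, 3} in the 1-based numbering above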
Example 3: The Assignment Problem
There are n people who need to be assigned to n
jobs, one person per job. The cost of assigning
person i to job j is C[i,j]. Find an assignment that
minimizes the total cost.
Job 0 Job 1 Job 2 Job 3
Person 0 9 2 7 8
Person 1 6 4 3 7
Person 2 5 8 1 8
Person 3 7 6 9 4
Algorithmic Plan: Generate all legitimate assignments,
compute their costs, and select the cheapest one.
How many assignments are there?
Pose the problem as one about a cost matrix:
Assignment Problem by Exhaustive Search
9 2 7 8
6 4 3 7
5 8 1 8
7 6 9 4
Assignment (col.#s) Total Cost
1, 2, 3, 4 9+4+1+4=18
1, 2, 4, 3 9+4+8+9=30
1, 3, 2, 4 9+3+8+4=24
1, 3, 4, 2 9+3+8+6=26
1, 4, 2, 3 9+7+8+9=33
1, 4, 3, 2 9+7+1+6=23
etc.
(For this particular instance, the optimal assignment can be found by exploiting the specific features of the numbers given. It is 2, 1, 3, 4 (person 0 → job 1, person 1 → job 0, person 2 → job 2, person 3 → job 3), with total cost 2 + 6 + 1 + 4 = 13.)
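
An exhaustive-search sketch in Python: a permutation perm assigns job perm[i] to person i (0-based):

from itertools import permutations

def assignment_exhaustive(C):
    n = len(C)
    best_cost, best = float("inf"), None
    for perm in permutations(range(n)):            # all n! assignments
        cost = sum(C[i][perm[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best, best_cost

C = [[9, 2, 7, 8], [6, 4, 3, 7], [5, 8, 1, 8], [7, 6, 9, 4]]
print(assignment_exhaustive(C))  # ((1, 0, 2, 3), 13) — columns 2, 1, 3, 4 in 1-based terms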
Final Comments on Exhaustive Search
Exhaustive-search algorithms run in a realistic amount
of time only on very small instances
In some cases, there are much better alternatives!
Euler circuits
shortest paths
minimum spanning tree
assignment problem
In many cases, exhaustive search or its variation is the
only known way to get exact solution
The Hungarian method
runs in O(n^3) time.
Voronoi diagram
The partitioning of a plane with points into
convex polygons such that each polygon contains
exactly one generating point and every point in a
given polygon is closer to its generating point
than to any other.
A Voronoi diagram is sometimes also known as a
Dirichlet tessellation. The cells are called Dirichlet
regions, Thiessen polytopes, or Voronoi polygons.
Voronoi diagrams were considered as early as 1644 by René Descartes and were used by Dirichlet (1850) in the investigation of positive quadratic forms.
They were also studied by Voronoi (1907), who
extended the investigation of Voronoi diagrams to
higher dimensions.
They find widespread applications in areas such as
computer graphics, epidemiology, geophysics, and
meteorology
CSE408
Measuring input size &
running time
Lecture #3
Analysis of algorithms
Issues:
correctness
time efficiency
space efficiency
optimality
Approaches:
theoretical analysis
empirical analysis
Theoretical analysis of time efficiency
Time efficiency is analyzed by determining the number of
repetitions of the basic operation as a function of input size
Basic operation: the operation that contributes most towards
the running time of the algorithm
T(n) ≈ c_op · C(n)
where T(n) is the running time, c_op is the execution time for the basic operation, C(n) is the number of times the basic operation is executed, and n is the input size.
Empirical analysis of time efficiency
Select a specific (typical) sample of inputs
Use physical unit of time (e.g., milliseconds)
or
Count actual number of basic operation’s executions
Analyze the empirical data
Best-case, average-case, worst-case
For some algorithms efficiency depends on form of input:
Worst case: Cworst(n) maximum over inputs of size n
Best case: Cbest(n) minimum over inputs of size n
Average case: Cavg(n) “average” over inputs of size n
Number of times the basic operation will be executed on typical input
NOT the average of worst and best case
Expected number of basic operations considered as a random variable under
some assumption about the probability distribution of all possible inputs
Example: Sequential search
Worst case: n key comparisons
Best case: 1 comparison
Average case: depends on assumptions about the input; see the next slide
Example
Let’s consider again sequential search. The standard assumptions
are that (a) the probability of a successful search is equal to p
(0 ≤ p ≤ 1) and (b) the probability of the first match occurring
in the ith position of the list is the same for every i.
we can find the average number of key comparisons Cavg(n) as
follows. In the case of a successful search, the probability of
the first match occurring in the ith position of the list is p/n for
every i, and the number of comparisons made by the algorithm
in such a situation is obviously i. In the case of an unsuccessful
search, the number of comparisons will be n with the
probability of such a search being (1 − p). Therefore,
Cavg(n) = (p/n)·(1 + 2 + … + n) + n·(1 − p) = p(n + 1)/2 + n(1 − p)
Types of formulas for basic operation’s count
Exact formula
e.g., C(n) = n(n-1)/2
Formula indicating order of growth with specific multiplicative
constant
e.g., C(n) ≈ 0.5 n^2
Formula indicating order of growth with unknown
multiplicative constant
e.g., C(n) ≈ c·n^2
Order of growth
Most important: Order of growth within a constant multiple as
n→∞
Example:
How much faster will the algorithm run on a computer that is twice as fast?
How much longer does it take to solve a problem of double the input size?
Values of some important functions as n → ∞
Conclusion
The efficiency analysis framework concentrates on the order of growth of an algorithm’s basic operation count as the principal indicator of the algorithm’s efficiency.
To compare and rank such orders of growth, computer scientists use three notations: O (big oh), Ω (big omega), and Θ (big theta).
CSE408
Asymptotic notations
Lecture #4
Asymptotic Notations
The efficiency analysis framework concentrates on the order of growth of an algorithm’s basic operation count as the principal indicator of the algorithm’s efficiency.
To compare and rank such orders of growth, computer scientists use three notations: O (big oh), Ω (big omega), and Θ (big theta).
O Notation
f(n) ∈ O(g(n)) if there exist a positive constant c and a nonnegative integer n0 such that f(n) ≤ c·g(n) for all n ≥ n0.
Example: 100n + 5 ∈ O(n^2), since 100n + 5 ≤ 101n ≤ 101n^2 for all n ≥ 5.
Big omega Notation
f(n) ∈ Ω(g(n)) if there exist a positive constant c and a nonnegative integer n0 such that f(n) ≥ c·g(n) for all n ≥ n0.
Example: n^3 ∈ Ω(n^2), since n^3 ≥ n^2 for all n ≥ 0.
Theta Notation
f(n) ∈ Θ(g(n)) if there exist positive constants c1, c2 and a nonnegative integer n0 such that c2·g(n) ≤ f(n) ≤ c1·g(n) for all n ≥ n0.
Example: n(n−1)/2 ∈ Θ(n^2), since n(n−1)/2 ≤ n^2/2 for all n ≥ 0 and n(n−1)/2 ≥ n^2/4 for all n ≥ 2.
Asymptotic order of growth
A way of comparing functions that ignores constant factors and small
input sizes
O(g(n)): class of functions f(n) that grow no faster than g(n)
Θ(g(n)): class of functions f(n) that grow at same rate as g(n)
Ω(g(n)): class of functions f(n) that grow at least as fast as g(n)
[Figures: graphs illustrating the big-oh, big-omega, and theta bounds]
Some properties of asymptotic order of growth
f(n) ∈ O(f(n))
f(n) ∈ O(g(n)) iff g(n) ∈ Ω(f(n))
If f(n) ∈ O(g(n)) and g(n) ∈ O(h(n)), then f(n) ∈ O(h(n))
Note the similarity with a ≤ b
If f1(n) ∈ O(g1(n)) and f2(n) ∈ O(g2(n)), then
f1(n) + f2(n) ∈ O(max{g1(n), g2(n)})
Establishing order of growth using limits
lim_{n→∞} T(n)/g(n) =
0      ⇒ order of growth of T(n) < order of growth of g(n)
c > 0  ⇒ order of growth of T(n) = order of growth of g(n)
∞      ⇒ order of growth of T(n) > order of growth of g(n)
Examples:
10n vs. n^2
n(n+1)/2 vs. n^2
L’Hôpital’s rule and Stirling’s formula
L’Hôpital’s rule: If lim_{n→∞} f(n) = lim_{n→∞} g(n) = ∞ and the derivatives f′, g′ exist, then
lim_{n→∞} f(n)/g(n) = lim_{n→∞} f′(n)/g′(n)
Stirling’s formula: n! ≈ (2πn)^(1/2) · (n/e)^n
Example: log n vs. n
Example: 2^n vs. n!
Small Oh Notation
f(n) ∈ o(g(n)) if for every positive constant c there exists a nonnegative integer n0 such that f(n) < c·g(n) for all n ≥ n0; equivalently, lim_{n→∞} f(n)/g(n) = 0.
Example: 2n ∈ o(n^2), but 2n^2 ∉ o(n^2).
Orders of growth of some important functions
All logarithmic functions log_a n belong to the same class Θ(log n), no matter what the logarithm’s base a > 1 is
All polynomials of the same degree k belong to the same class:
a_k n^k + a_(k−1) n^(k−1) + … + a_0 ∈ Θ(n^k)
Exponential functions a^n have different orders of growth for different a’s
order log n < order n^ε (ε > 0) < order a^n < order n! < order n^n
Basic asymptotic efficiency classes
1        constant
log n    logarithmic
n        linear
n log n  n-log-n
n^2      quadratic
n^3      cubic
2^n      exponential
n!       factorial